Abstract

Turbulence near hot surfaces, such as desert terrain and roads in summer, causes shimmering, distortion and blurring in images. While recent works have focused on image restoration, this paper explores what information about the scene can be extracted from the distortion caused by turbulence. Based on the physical model of wave propagation, we first study the relationship between scene depth and the amount of distortion caused by homogeneous turbulence. We then extend this relationship to more practical scenarios such as finite-extent and height-varying turbulence, and present simple algorithms to estimate depth ordering, depth discontinuity and relative depth from a sequence of short-exposure images. In the case of general non-homogeneous turbulence, we show that a statistical property of turbulence can be used to improve long-range structure-from-motion (or stereo). We demonstrate the accuracy of our methods in both laboratory and outdoor settings and conclude that turbulence (when present) can be a strong and useful depth cue.

1.  Introduction 

The visual manifestations of clear air turbulence occur often in our daily lives: from hot kitchen appliances like toasters and ovens, to plumes of airplanes, to desert terrains, to roads on hot summer days, to the twinkling of stars at night. The shimmering and distortion observed are caused by random fluctuations of temperature gradients near warm surfaces. In this case, the image projection of a scene point viewed through turbulence is no longer a deterministic process, and often leads to poor image quality.

Several works in remote sensing and astronomical imaging have focused on image correction through turbulence. For atmospheric turbulence, the distorted wavefronts arriving from stars can be optically corrected using precisely controlled deformable mirror surfaces, beyond the angular resolution limit of telescopes [16]. For terrestrial imaging applications, recent works have proposed to digitally post-process the captured images to correct for distortions and to deblur images [6, 3, 4, 9, 26]. Optical-flow-based methods have further been used to register the image sequences to achieve modest super-resolution [18].

While previous works have focused on what turbulence does to vision, this article addresses the question of what turbulence can do for vision. In other words, what information about the scene can be extracted when it is viewed through turbulence? Based on the physical model of wave propagation, we study the relationship between the scene depth and the amount of distortion caused by homogeneous turbulence over time (see an intuitive illustration in Fig. 1). Then, we extend this relationship to more practical scenarios of finite-extent and height-varying turbulence, and show how and in what scenarios we can estimate depth ordering, depth discontinuity and relative depths. Although general non-homogeneous turbulence does not directly yield depth information, its statistical property can be used along with a stereo camera pair to improve long-range depth estimation.

Figure 1. Random fluctuations in the refractive index of a medium cause the perturbation of a light wave radiating from a scene point. The resulting image projections of the scene point over time are also random. The longer the distance of a scene point from the camera, the greater the variance of its image projection.

The input to our techniques is a sequence of short-exposure images captured from a stationary camera (or camera pair). Depth cues are obtained by first tracking image features and then computing the variances of tracker displacements over time. Any feature tracking algorithm can be applied, such as one based on template matching. We verify our approaches in both laboratory and outdoor settings by comparing against known (ground truth) distances of the scene from the camera. We also analyze how the depth cue estimation is influenced by the parameters of the imaging system, such as aperture, exposure time and the number of frames. The depth information computed is surprisingly accurate, even when the scene and camera are not within the turbulent region. Hence, we believe that turbulence should not be viewed only as "noise" that an imaging system must overcome, but also as an additional source of information about the scene that can be readily extracted.¹

¹ While not the focus of this work, the short-exposure (noisy) and distorted input images can be combined using a dense image alignment approach [21] to improve image quality (see Supplementary material).
1.1. Related Work


Characterizing the structure of turbulence is one of the open problems in physics, with a long research history, starting from the early methods of Kolmogorov [10]. For this work, we reference multiple textbooks by Kopeika [12], Tatarskii [20], Ishimaru [8] and Roggemann [16]. To our knowledge, the key physical model (Eqn. 3) in these texts has not been exploited by the computer vision community.

Direct measurement of turbulent media has received much attention in fluid dynamics. Shadowgraph and Schlieren imaging [17, 24] techniques are often used to capture the complex airflow around turbines, car engines and airplane wings. Image displacement observed in turbulent media has been shown to be proportional to the integral of the refractive index gradient field. This property is exploited in a tomographic approach [5] with many views to compute the density field of the medium from image displacements of known backgrounds. This approach, called Background Oriented Schlieren (BOS) imaging [23, 22, 15], has emerged as a new technique for flow visualization of density gradients in fluids. Such approaches have also been used to render refractive volumes of gas flow [2]. Similarly, there has been work [13] that aims to estimate the shape of a curved refractive interface between two media (water and air, for example) using stereo and known backgrounds. In contrast, our work exploits image displacements to extract the depth cues of an unknown scene using an image sequence captured from a single viewpoint.

2.  Characterization  of  Turbulence 

Turbulence causes random fluctuations of the refractive index $n(\mathbf{r}, t)$ at each location $\mathbf{r}$ in the medium and at time $t$. From Kolmogorov's seminal work [10, 11], $n(\mathbf{r}, t)$ forms a random field in space-time and can be characterized by a structure function $D(\mathbf{r}_1, \mathbf{r}_2, t)$ that computes the expected squared difference of the refractive index at two distinct spatial locations $\mathbf{r}_1$ and $\mathbf{r}_2$:

$$D(\mathbf{r}_1, \mathbf{r}_2, t) = \left\langle \left| n(\mathbf{r}_1, t) - n(\mathbf{r}_2, t) \right|^2 \right\rangle. \qquad (1)$$

For stationary turbulence, the structure function is constant over $t$, i.e., $D(\mathbf{r}_1, \mathbf{r}_2, t) = D(\mathbf{r}_1, \mathbf{r}_2)$. Stationary turbulence is homogeneous if $D(\mathbf{r}_1, \mathbf{r}_2) = D(\mathbf{r})$, where $\mathbf{r} = \mathbf{r}_1 - \mathbf{r}_2$. This means that the structure function depends only on the relative displacement of the locations. Homogeneous turbulence is isotropic if the structure function is spherically symmetric, i.e., $D(\mathbf{r}) = D(r)$, where $r = \|\mathbf{r}\|$. From dimensional analysis, Kolmogorov shows that the structure function follows a 2/3 power law [20]:

$$D(r) = C_n^2\, r^{2/3}, \qquad (2)$$

where the constant $C_n^2$ reflects the strength of turbulence. For non-homogeneous turbulence, $C_n^2$ is a function of absolute location. A non-turbulent atmosphere has $C_n^2 = 0$.


Figure 2. The phase difference of an incident wave (e.g., the phase of point B leads that of A) at the aperture determines the angle-of-arrival $\alpha$ (AoA) and, in turn, the center of the diffraction kernel, i.e., the location of the projected scene point in the image plane.

In general, the strength $C_n^2$ of turbulence depends on a variety of environmental and physical factors, such as temperature, pressure, humidity and wavelength of light, which in turn depend on the time of day (less during sunset and sunrise, more at mid-day), cloud cover (less during cloudy days and more during cloudy nights), and wind patterns. An empirical relationship between these factors and refractive index changes can be found in Kopeika's textbook [12].

3.  Image  Formation  through  Turbulence 

When an electromagnetic wave propagates through a turbulent medium, it undergoes random fluctuations in both amplitude and phase. The perturbed phase determines the angle-of-arrival (AoA) of the light incident at the camera, which in turn fixes the projected location of the scene point in the image (Fig. 2). Mathematically, the propagation of an electric field under the influence of the turbulence structure function in Eqn. 2 can be obtained by solving Maxwell's equations. Since most surfaces and sources of interest to computer vision are at finite distances from the camera and produce divergent waves, we will consider the propagation of spherical waves. Then, following the derivations in [12, 8], the variance $\langle\alpha^2\rangle$ of the angles-of-arrival of the waves from a scene point at distance $L$ from the camera is obtained by integrating along the line of sight:

$$\langle\alpha^2\rangle = 2.914\, D^{-1/3} \int_0^{L} C_n^2(z) \left(\frac{z}{L}\right)^{5/3} dz, \qquad (3)$$

where $D$ is the diameter of the aperture. The actual fluctuation $\langle\delta^2\rangle$ of the projected image location can be computed using the relation $\delta = f \tan\alpha$, where $f$ is the focal length of the camera. For small angles, $\delta \approx f\alpha$.

Figure 3. Image formation through turbulence. (a): Both the camera and the scene are immersed within a homogeneous turbulence region. (b): The camera and/or scene are outside the turbulence region.

Figure 4. The variance of the angle-of-arrival predicted by Eqn. 6 under different experimental settings. For each curve, the camera and the extent of turbulence ($L_c$ and $L_t$) are fixed, while the scene depth ($L_s$) is varied. Each curve represents a monotonically increasing function of scene depth that converges to a fixed variance (at infinity). The dashed black line is the linear relation in the special case when $L_c = L_s = 0$.

In the following, we will discuss three important special cases of the above image formation model. We will address the general case of non-homogeneous turbulence in Section 7. First, consider a scenario where both the camera and the scene of interest are immersed in a homogeneous turbulence medium (for example, a road scene with vehicles on a hot summer day), as illustrated in Fig. 3(a). Since $C_n^2$ is a constant, we can integrate Eqn. 3 to obtain:

$$\langle\alpha^2\rangle = 2.914\, D^{-1/3} C_n^2 \int_0^{L} \left(\frac{z}{L}\right)^{5/3} dz = \frac{3}{8} K_n^2 L, \qquad (4)$$

where $K_n^2 = 2.914\, D^{-1/3} C_n^2$. So, the variance of projected positions of the scene point in the image plane over time is directly proportional to the distance $L$ between the scene point and the camera. Setting aside the issue of spatial resolution, this linear relationship determines depth with constant precision for all distances within the turbulence region. By comparison, in stereo, the depth precision falls as the square of the distance from the camera to the scene.
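As a numerical sanity check on the step from Eqn. 3 to Eqn. 4, the following sketch integrates the spherical-wave weighting $(z/L)^{5/3}$ with a midpoint rule and compares the result with the closed form $\frac{3}{8} K_n^2 L$. The function names, the sample values of $C_n^2$, $D$ and $L$, and the convention that $z$ runs from the scene point ($z = 0$) to the camera ($z = L$) are illustrative assumptions, not values taken from the experiments.

```python
import math

def aoa_variance(cn2, L, D, steps=20000):
    """Midpoint-rule evaluation of Eqn. 3.

    cn2  -- callable z -> C_n^2(z); z = 0 at the scene, z = L at the camera
            (an assumed convention for this sketch)
    L, D -- path length and aperture diameter, in the same length unit
    """
    dz = L / steps
    total = 0.0
    for k in range(steps):
        z = (k + 0.5) * dz
        total += cn2(z) * (z / L) ** (5.0 / 3.0) * dz
    return 2.914 * D ** (-1.0 / 3.0) * total

def aoa_variance_homogeneous(cn2_const, L, D):
    """Closed form of Eqn. 4: (3/8) * K_n^2 * L."""
    k_n2 = 2.914 * D ** (-1.0 / 3.0) * cn2_const
    return 0.375 * k_n2 * L

def image_std(var_alpha, focal_length):
    """Small-angle relation delta ~ f * alpha, applied to the standard deviation."""
    return focal_length * math.sqrt(var_alpha)

# Hypothetical values: constant C_n^2, 30 mm aperture, 100 m and 200 m paths.
CN2, D = 1e-13, 0.03
v100 = aoa_variance(lambda z: CN2, 100.0, D)
v200 = aoa_variance(lambda z: CN2, 200.0, D)
```

Under these assumptions the numerical integral agrees with the closed form, and doubling the path length doubles the variance, which is the linear variance-depth relation stated above.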

In many scenarios, like the plume of an aircraft or a steaming kettle, the source of turbulence may not extend over the entire line-of-sight from the camera to the scene. In this case, we will assume local homogeneity, i.e., $C_n^2$ is a constant within a short range and zero elsewhere. For convenience, we decompose $L$ into three parts: $L = L_s + L_t + L_c$, as illustrated in Fig. 3. $L_s$ is the distance between the scene point and the turbulence region, $L_t$ is the path length within the turbulence region and $L_c$ is the distance between the camera and the turbulence region. Once again, we can integrate Eqn. 3 to obtain the analytic form:

$$\langle\alpha^2\rangle = \frac{K_n^2}{L^{5/3}} \int_{L_s}^{L_t + L_s} z^{5/3}\, dz = K_n^2\, \frac{3}{8 L^{5/3}} \left( (L_t + L_s)^{8/3} - L_s^{8/3} \right). \qquad (5)$$

Letting $\langle\alpha^2_{\mathrm{relative}}\rangle = \frac{3}{8 L^{5/3}} \left( (L_t + L_s)^{8/3} - L_s^{8/3} \right)$ allows us to write in short:

$$\langle\alpha^2\rangle = K_n^2\, \langle\alpha^2_{\mathrm{relative}}\rangle. \qquad (6)$$


If we fix the camera location $L_c$ and the turbulence region $L_t$, and move the scene point away from the camera, the variance is a monotonically increasing function with respect to $L$, as shown in Fig. 4. From this, we observe that the variance increases even if the scene moves away from the turbulence region. This is a counter-intuitive result that cannot be explained by ray optics (hence, the usage of "waves" in this article). The variance, however, converges to a fixed value $\langle\alpha^2_\infty\rangle$ when the scene point is infinitely far away from the camera (e.g., a distant star). This can be seen by taking the limit $L_s \to \infty$ in Eqn. 6 to obtain:

$$\langle\alpha^2_\infty\rangle = \lim_{L_s \to \infty} K_n^2\, \frac{3}{8 L^{5/3}} \left( (L_t + L_s)^{8/3} - L_s^{8/3} \right) = K_n^2 L_t. \qquad (7)$$

In this case, the light emitted by the scene point can be modeled as a plane wave.
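The behavior described by Eqns. 6 and 7 can be checked numerically. The sketch below evaluates the relative term of Eqn. 6 for a fixed, hypothetical configuration ($L_t = 1$, $L_c = 0.5$, arbitrary units) and a range of $L_s$ values; the function name and the sample values are assumptions made for illustration only.

```python
def relative_variance(L_s, L_t, L_c):
    """Relative term of Eqn. 6:
    3 / (8 L^(5/3)) * ((L_t + L_s)^(8/3) - L_s^(8/3)), with L = L_s + L_t + L_c."""
    L = L_s + L_t + L_c
    return 3.0 / (8.0 * L ** (5.0 / 3.0)) * (
        (L_t + L_s) ** (8.0 / 3.0) - L_s ** (8.0 / 3.0)
    )

# Hypothetical configuration: unit-length turbulence region, camera 0.5 units away.
L_T, L_C = 1.0, 0.5
scene_offsets = [0.0, 0.5, 1.0, 2.0, 5.0, 20.0, 1e3, 1e6]
values = [relative_variance(s, L_T, L_C) for s in scene_offsets]
```

For this configuration the values increase with $L_s$ and approach $L_t$ for very distant scene points, matching the limit in Eqn. 7. With $L_s = L_c = 0$ the same function reduces to $\frac{3}{8} L$, the linear case of Eqn. 4.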

Height-varying turbulence. In practice, the air turbulence may not be homogeneous in the entire field of view. For example, on an asphalt road, the turbulence is more severe near the road surface than away from it. We model this effect by writing the strength of turbulence as a smoothly varying function of height $h$, yielding a separable model:

$$\langle\alpha^2\rangle = K_n^2(h)\, \langle\alpha^2_{\mathrm{relative}}\rangle. \qquad (8)$$

Typically, $C_n^2(h)$ (or $K_n^2(h)$) decreases with respect to $h$.

4.  Depth  cues  from  an  Image  Sequence 

In this section, we investigate what depth cues can be obtained from the observed variance of image displacements of scene points. The input to all our algorithms is a sequence of images of a stationary scene viewed through turbulence by a fixed camera. Once the images are captured, we track a sparse set of distinctive feature points on the scene. While any feature-tracking algorithm may be used, we adopt a simple frame-to-frame template matching approach. To handle image blurring caused by turbulence, we also add blurred versions of the templates. The variance of the image location of each tracked point is then computed.

Figure 5. Experimental setup: Three adjacent electric cooking griddles are heated up to 400 degrees Fahrenheit to create hot air turbulence. A camera observes a scene through the turbulence. By varying the temperature, we can emulate a variety of outdoor turbulence strengths and path-lengths of several kilometers.
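The last step, computing the variance of each tracked point, can be sketched as follows. The text does not specify whether the horizontal and vertical components are combined; summing the two per-axis variances is an assumption of this sketch, as are the function name and the toy tracks.

```python
from statistics import fmean

def displacement_variance(track):
    """Positional variance of one tracked point over time.

    track -- list of (x, y) image positions, one per frame.
    Returns var(x) + var(y) about the mean position (population variance);
    combining the two axes this way is an assumption, not the paper's stated choice.
    """
    mx = fmean(p[0] for p in track)
    my = fmean(p[1] for p in track)
    return fmean((x - mx) ** 2 + (y - my) ** 2 for x, y in track)

# Hypothetical tracks (in pixels) for two feature points over five frames.
near_point = [(10.0, 5.0), (10.2, 5.1), (9.9, 4.9), (10.1, 5.0), (9.8, 5.0)]
far_point = [(40.0, 5.0), (40.6, 5.3), (39.5, 4.6), (40.4, 5.2), (39.5, 4.9)]
variances = [displacement_variance(t) for t in (near_point, far_point)]
```

In this toy input the second point jitters more than the first, so its variance is larger; under the model it would be interpreted as the more distant point.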

For a fixed configuration of camera and extent of the homogeneous turbulence region, the model (Eqn. 6) is a monotonic smooth function of scene depth. Thus, both depth ordering and discontinuities (like two buildings far apart) of the scene can be readily obtained from variances. In particular, detecting such discontinuities can be useful to segment the scene into different depth layers (planes).

On the other hand, a more quantitative measurement, such as relative depth between scene points, requires additional assumptions. Note that absolute depth cannot be computed without knowing the turbulence strength $C_n^2$. Thus, without loss of generality, we will assume $L_t = 1$. When the camera and scene are immersed in turbulence ($L_c = L_s = 0$), the linear variance-depth relationship (Eqn. 4) allows us to obtain relative depth by taking variance ratios to eliminate the unknown constant $K_n^2$. In general, if $L_c$, $L_s$ and $K_n^2$ are known, depth can be obtained by inverting Eqn. 6. By monotonicity of the model, a given variance corresponds to a unique depth.
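Both ideas in this paragraph can be illustrated with a short sketch. The first function takes variance ratios for the immersed case (Eqn. 4). The second inverts the relative term of Eqn. 6 by bisection, which is valid because the model is monotonic. Treating $L_t$ and $L_c$ as given and solving for $L_s$ is one possible reading chosen for this sketch; the function names, the search bound and the sample numbers are likewise assumptions.

```python
def relative_depths_immersed(variances, reference=0):
    """Eqn. 4 with L_c = L_s = 0: variance is proportional to depth,
    so dividing by a reference variance cancels the unknown K_n^2."""
    ref = variances[reference]
    return [v / ref for v in variances]

def relative_variance(L_s, L_t, L_c):
    """Relative term of Eqn. 6."""
    L = L_s + L_t + L_c
    return 3.0 / (8.0 * L ** (5.0 / 3.0)) * (
        (L_t + L_s) ** (8.0 / 3.0) - L_s ** (8.0 / 3.0)
    )

def invert_depth(target, L_t, L_c, upper=1e6, iters=200):
    """Bisection for the L_s whose relative variance equals `target`.

    `target` is the measured variance divided by a known K_n^2.
    Monotonicity of Eqn. 6 in L_s guarantees a single solution in range.
    """
    lo, hi = 0.0, upper
    if not relative_variance(lo, L_t, L_c) <= target <= relative_variance(hi, L_t, L_c):
        raise ValueError("target variance is outside the range of the model")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if relative_variance(mid, L_t, L_c) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

ratios = relative_depths_immersed([2.0, 4.0, 6.0])
recovered = invert_depth(relative_variance(3.0, 1.0, 0.5), L_t=1.0, L_c=0.5)
```

Here `ratios` gives depths relative to the first point, and `recovered` returns the $L_s$ that was used to synthesize the target variance, illustrating the uniqueness claim.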

However, in practice, these constants are usually unknown. Thus, for $N$ scene points we have $N + 3$ unknowns ($N$ depths plus $L_c$, $L_s$ and $K_n^2$) but only $N$ equations. Consider a scene with repetitive patterns (windows on a building, street lamps, cars parked on a street); then the depths $\{L_i\}_{i=1}^{N}$ of the $N$ points follow an arithmetic sequence:

$$L_i = L_0 + i\,\Delta L. \qquad (9)$$

Thus, the $N$ depths $\{L_i\}_{i=1}^{N}$ are parameterized by 2 variables, $L_0$ and $\Delta L$. As a result, only $2 + 3 = 5$ scene points suffice to estimate (using numerical optimization) both the relative depths and the extent of the turbulence region.

In the case of height-varying turbulence, we need to estimate the height-varying function $K_n^2(h)$ as well as the scene depth. Fortunately, if the height is aligned with the y-axis of the image, then separating depth from height can be achieved by treating each scan-line individually.


Figure 6. Left: LEDs are immersed in the turbulence region. Right: The relationship between the variance of LED projections and their ground truth depths (horizontal axis: ground truth LED depth) is very close to linear (correlation coefficient is 0.987), and is consistent with the model (Eqn. 6).


5.  Laboratory  Experiments 

We performed several experiments in a controlled laboratory environment to validate the theory. A flat cooking griddle of size 52 cm x 26 cm is used to produce and maintain uniform heat of up to 400 degrees Fahrenheit across the flat surface. In the experimental setup (Fig. 5), multiple such griddles are placed side by side to increase the path-length of turbulence. The three griddles set at maximum temperature produce roughly the same shimmering as a kilometer of natural turbulence in the desert. By controlling the number of griddles and the temperature, a wide range of turbulence strengths seen outdoors can be emulated. In all experiments, variances are estimated by capturing a 20-30 second long video sequence of the scene at 30 fps.

5.1.  Quantitative  Evaluation 

Variance-depth linearity within turbulence region. Fifty equally spaced LEDs are placed 5 cm above the hot griddle (in the turbulence region). One end of the stick is closer to the camera while the other is farther away. Fig. 6 shows the variances computed for each LED projection onto the image plane, averaged over 3 experimental trials. Consistent with the model (Eqn. 6) when $L_s = L_c = 0$, the relationship between the depth (represented by the indices of the LEDs) and the variances is indeed linear, with a high correlation coefficient of 0.987. A similar experiment that estimates the depth of a curved line on a sphere is also shown in Fig. 7.
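The linearity check reported here uses a correlation coefficient between LED index and measured variance. A minimal sketch of that computation is given below, assuming the Pearson coefficient is the one intended; the data are synthetic (a linear trend with a small alternating perturbation), not the measurements of Fig. 6.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Synthetic stand-in for the LED experiment: variance grows linearly with the
# LED index, plus a small alternating perturbation that mimics measurement noise.
led_index = list(range(1, 11))
led_variance = [0.5 * i + (0.05 if i % 2 else -0.05) for i in led_index]
r = pearson_r(led_index, led_variance)
```

A coefficient close to 1 indicates that the variances are well described by a straight line in the LED index, which is what the immersed-case model predicts.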

Identifying depth discontinuity. In this experiment, we place two checker-board patterns vertically at two distinct depths ($L_{near}$ and $L_{far}$) and measure variances of the key points on the scene. We conducted four experiments with different settings of $L_{near}$ and $L_{far}$ (Table 1). All were captured with the same aperture setting of f/11 and an exposure of 1/2000 s, while the zoom was varied to include the entire scene within the field of view. Fig. 8 illustrates the variance discontinuity by two separate parametric fittings of the key points on two checker-boards in one experiment. Clearly, the depth discontinuity can be detected from the variance discontinuity.

Validation of the physics model (Eqn. 6). As shown in Fig. 8, due to height variations of the turbulence, the variance changes smoothly over the y-axis. However, the variance ratio computed from two points on the two checker-boards at the same scan-line is independent of the height $h$, the amount of turbulence $C_n^2$ and the aperture diameter $D$. On the other hand, we can compute the theoretical variance ratios using the ground truth values of $L$, $L_c$, $L_t$ and the model in Eqn. 6. The measurement is consistent with the theory, validating the model in all four settings (Table 1), which cover both cases where the scene is within and outside the turbulence region.

Figure 7. Ellipse fitting on a video sequence capturing a curve on the sphere through turbulence. Ideally the projection of the LEDs forms an ellipse on the image plane. Left: A sample frame of the captured video sequence. Right: Average error in fitting is 12.6%. The average fitting error between a covariate $x$ and dependent variable $y$ is computed using $\sqrt{\sum_i (\hat{y}_i - y_i)^2} \big/ \sqrt{\sum_i (\bar{y} - y_i)^2}$, where $\hat{y}_i$ is the fitted value of point $x_i$ and $\bar{y}$ is the mean of $y$.

Figure 8. Experiments with two planar checker-boards placed at different distances from the camera. Due to space limits, we only show one (Exp. 2) of the 4 experiments, and leave the remaining ones to the supplementary material. The experimental settings can be found in Table 1. The first column shows a sample distorted frame; the second and third columns show two views of the variance distribution of the corners of the checker-boards. The changes in variance due to depth discontinuity and height are clearly visible. We detect the discontinuity and fit smooth surfaces to the variances. The ratios of the variances of the two depth planes are then computed and quantitatively compared to the ground truth (Table 1).

Figure 9. Depth estimation of equally spaced points on an inclined planar scene. A sample distorted frame due to turbulence is shown on the left. On the right are two views of the variance distribution of a sparse set of key points and a smooth function fit illustrating the near planar geometry. From this, it is possible to predict the relative depths of the scene points (assuming the length of the turbulence region to be 1). For evaluation, the relative depth estimated is converted to actual depth by using the actual length (196 cm) of the turbulence region. The estimated slope of the plane is 0.517 cm per 10 pixels in the horizontal direction, compared to the ground truth value of 0.529 cm per 10 pixels.


No.     L_c    L_t    L_near   L_far   Measured   Predicted
Exp1    54     171    163      225     1.77       1.79
Exp2    74     173    183      382     3.60       3.52
Exp3    74     173    247      382     1.67       1.94
Exp4    74     173    247      320     1.56       1.55

Table 1. Columns 1-4 ($L_c$, $L_t$, $L_{near}$, $L_{far}$) show the ground truth measurements (in centimeters) for the four checker-board experiments. Columns 5-6 show the comparison between the measured variance ratio (5th column) and that predicted by the model (6th column). In all but one case, the measurements are very accurate.

Depth estimation of equally spaced scene points. We estimate the depths of equally spaced points using the method in Section 4. A horizontally slanted plane with a texture of a building facade is placed behind the turbulence region. Fig. 9 shows the two views of the computed variances at key points and the surface fits that demonstrate the near planar geometry. Assuming $L_t = 1$ (length of turbulence region), relative depths of scene points can be computed. For validation, the estimated relative depths are converted to absolute ones using the actual length ($L_t = 196$ cm) of the turbulence region. The depth slope of the plane ($\Delta L$ in Section 4) is estimated as 0.517 cm per 10 pixels in the horizontal direction, compared to the ground truth value of 0.529 cm per 10 pixels. Please see the supplementary material for videos.


Figure 10. Influence of imaging parameters on the variance of the corners of the checker-board pattern. (a) Number of frames: the variance estimates converge with sufficient frames, showing that the turbulence is stationary during measurement. (b) Exposure (seconds): the measured variance is similar for different exposures, except for very long exposures where the tracking performance degrades due to motion blur. (c) Consistent with the model, the measured variance decreases significantly with aperture size. (d) Insufficient image resolution results in much lower variance estimation.


5.2.  Influence  of  Imaging  Parameters 

The accuracy of the measured variance depends on many imaging parameters. A larger aperture reduces the depth-of-field, a longer exposure time adds unwanted motion blur, and low image magnification causes greater quantization of the variance. Here we present an empirical study.

The estimate of the variance converges as the number of captured frames increases. Fig. 10(a) shows the variance of the 10 key points in Exp. 1 with different numbers of frames used. In our experiments, stable estimates are achieved using frames captured over 30 seconds with a 30 fps camera. To study the effects of aperture, we vary the f/# from f/3.7 to f/11, fix the exposure time at 1/8000 s, and zoom at the highest level. Fig. 10(c) shows a significant decrease in variance as predicted by the model in Eqn. 6. The plot in Fig. 10(b) shows the variances computed by changing only the exposure time. Fig. 10(d) shows the effect of varying focal length. Here we normalize the variances by the pixel size of the checker-board patterns to remove the effects of image magnification. In these experiments, we have tried to maintain the same noise level in the camera by maintaining similar image brightness (by varying illumination intensity). From these plots, aperture size is the main factor that affects the estimation quality. However, since we take the variance ratio as a depth measure, the effect of aperture size is reduced (in theory, the ratio is independent of it). Also, for very low magnification (spatial resolution) or long exposure time (1/30 s), the variance estimate shows large degradation.
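The convergence of the variance estimate with the number of frames can be monitored with a running estimate. The sketch below uses Welford's online algorithm, which is a standard technique chosen here for illustration and not necessarily how the curves in Fig. 10(a) were produced. The simulated displacements are zero-mean Gaussian samples with a hypothetical standard deviation of 2 pixels, also an assumption.

```python
import random

def running_variance(samples):
    """Welford's online algorithm: population variance after each new sample."""
    mean, m2, history = 0.0, 0.0, []
    for n, x in enumerate(samples, start=1):
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        history.append(m2 / n)
    return history

# Simulated 1-D tracker displacements: 30 s at 30 fps = 900 frames.
rng = random.Random(0)
displacements = [rng.gauss(0.0, 2.0) for _ in range(900)]
history = running_variance(displacements)
```

Plotting `history` against the frame index gives a curve of the kind described above: noisy for the first frames and settling near the true variance (here 4 square pixels) as more frames are included.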

In addition, camera shake can cause false displacements leading to poor variance estimates. In general, this is a hard problem and we set it aside for future work.


Figure 11. Sample frames of the outdoor experiments in the morning and afternoon. The targets are placed at different distances from the camera (panel labels: 110 meters to target, 160 meters to target).


6.  Outdoor  Experiments 

Besides indoor experiments in a controlled environment, we also conducted experiments outdoors in a desert region. The imaging setup used for the experiments consists of a Prosilica GC1380H camera and a Celestron C6 telescope. The focal length is 1500 mm. A one-to-one optical relay is used between the telescope and the camera, without changing the focal length of the main telescope. We placed two standard contrast targets 110 meters and 160 meters away from the camera and captured sequences of 300-400 images during mild turbulence (morning) and strong turbulence (afternoon). We used a 30 mm aperture in the morning and a 10 mm aperture in the afternoon, and varied the exposure times between 0.5 ms and 1 ms.

We tracked a sparse set of points through each image sequence and rejected outliers such as the static trackers of the dirt on the CCD and high-variance erroneous trackers near locally repetitive textures. The computed variances of the trackers converge quickly (see supplementary material).

Table 2 shows the mean variance computed from all the trackers of each image sequence, for each depth and imaging setting. Since the amount of variance is relatively invariant to exposure change, we further averaged the variance over different exposures. If we take the variance ratio between the 160 m and 110 m targets, we obtain 1.6857 for the 30 mm aperture in the morning and 1.5785 for the 10 mm aperture in the afternoon. Both are close to the ratio of the two distances, 160/110 = 1.4545, verifying the dependence of the turbulence model on depth. Table 2 also shows the standard deviation of the variances in each video sequence. The low standard deviation shows that the turbulence is indeed homogeneous on surfaces that are perpendicular to the optical axis. Note that we do not consider the height variation of turbulence since, compared to the laboratory setting, the target occupies a much narrower field of view.


Morning  Capture.  Target  llOmeters  away,  30mm  aperture. 

0.50ms  exposure 

0.75ms  exposure 

1.00ms  exposure 

4.32  =b  0.49 

4.89  ±  0.46 

4.79  ±  0.45 

Average:  4.67 

Morning  Capture.  Target  160meters  away,  30mm  aperture. 

0.50ms  exposure 

0.75ms  exposure 

1.00ms  exposure 

8.23  ±0.88 

7.61  ±  0.67 

7.75  ±  0.64 

Average:  7.86 

Afternoon  Capture.  10mm  aperture  and  5ms  exposure 

Target  llOmeters  away 

Target  160meters  away 

43.68  ±  7.50 

69.37  ±9.49 

Table 2. Average variance (and its standard deviation) of trackers
for the outdoor experiment. The variance ratios between the 160m and
110m turbulence videos are 7.86/4.67=1.6857 (30mm aperture captured
in the morning) and 69.37/43.68=1.5785 (10mm aperture captured in
the afternoon), close to the distance ratio (160m/110m=1.4545),
verifying our model.
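The comparison in Table 2 can be re-derived with a few lines of arithmetic. The following is a minimal sketch that takes the rounded table entries as input; the variable and function names are illustrative only, and because the inputs are rounded, the recomputed ratios may differ in the later digits from the values quoted above.

```python
# Mean tracker variances copied from Table 2 (rounded values as printed).
morning_110 = [4.32, 4.89, 4.79]   # 0.50, 0.75, 1.00 ms exposures, 110 m target
morning_160 = [8.23, 7.61, 7.75]   # same exposures, 160 m target
afternoon_110 = 43.68              # 5 ms exposure, 110 m target
afternoon_160 = 69.37              # 5 ms exposure, 160 m target


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def variance_ratio(var_far, var_near):
    """Ratio of the far-target variance to the near-target variance."""
    return var_far / var_near


distance_ratio = 160.0 / 110.0
morning_ratio = variance_ratio(mean(morning_160), mean(morning_110))
afternoon_ratio = variance_ratio(afternoon_160, afternoon_110)

for label, ratio in [("morning", morning_ratio), ("afternoon", afternoon_ratio)]:
    rel_gap = (ratio - distance_ratio) / distance_ratio
    print(f"{label}: variance ratio {ratio:.3f}, "
          f"distance ratio {distance_ratio:.3f}, relative gap {rel_gap:+.1%}")
```

The relative gap gives a rough sense of how closely each capture follows a variance that grows in proportion to distance.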

7.  Jitter-stereo  in  Nonhomogenous  Turbulence 

Until now, we have addressed depth estimation under homogenous and
simple height-varying turbulence. However, due to unpredictable
temperature and humidity fluctuation, turbulence cannot be
guaranteed to be homogenous over a large area and a long period of
time. In this case, it is impossible to estimate depth using the
model without knowing the turbulence structure function. Instead, we
will exploit a general statistical property of turbulence along with
a stereo camera pair to improve long-range depth estimation.

Recall that binocular stereo estimates the depth of a scene point by
computing scene disparity across two views. If a scene point is far
away compared to the stereo baseline, the disparity may be less than
one pixel and the depth estimation may fail. However, in the
presence of turbulence, the location of the scene point "jitters"
around its true location in the image. But how do we estimate the
true location of a scene point without turbulence? It has been
observed that the distribution of (even non-homogeneous) turbulence
distortions is close to zero-mean. This is true even if the variance
of each scene point is different. Thus, the mean positions of the
tracked scene points are their most likely positions when there is
no turbulence. Furthermore, estimating the mean locations of
trackers allows us to obtain the disparity, possibly with sub-pixel
accuracy, which helps long-range depth estimation with a short
baseline. This approach is also similar in spirit to Swirski et al.
[1( ], who estimate correspondences in stereo using underwater
caustic patterns.
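The averaging idea can be illustrated with a small simulation. This is only a sketch under stated assumptions: the jitter is modeled as zero-mean Gaussian noise with a different strength in each view, and the positions, disparity, and noise levels are hypothetical values chosen for illustration, not measurements from the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

n_frames = 319          # sequence length, chosen to match the order used in Fig. 13
true_x_left = 100.0     # hypothetical turbulence-free x position in view 1 (pixels)
true_disparity = 0.3    # hypothetical sub-pixel disparity (pixels)
true_x_right = true_x_left + true_disparity

# Assumed zero-mean jitter; the two views have different variances,
# mimicking sequences captured under different turbulence strengths.
x_left = true_x_left + rng.normal(0.0, 2.0, n_frames)
x_right = true_x_right + rng.normal(0.0, 3.0, n_frames)

# Disparity estimated independently in every frame pair.
per_frame_disparity = x_right - x_left
per_frame_rms_error = np.sqrt(np.mean((per_frame_disparity - true_disparity) ** 2))

# Disparity estimated from the mean tracker locations.
mean_disparity = x_right.mean() - x_left.mean()
mean_error = abs(mean_disparity - true_disparity)

print(f"per-frame RMS disparity error: {per_frame_rms_error:.3f} px")
print(f"error using mean locations:    {mean_error:.3f} px")
```

Under this noise model the error of the mean-based estimate shrinks roughly as one over the square root of the number of frames, which is why a sub-pixel disparity can become measurable even when individual frames are far too noisy.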

In order to experimentally verify our approach, we captured two
image sequences of a planar scene 110m away from different view
points, with a baseline of less than 1m. The two sequences were
captured at different times when the turbulence was significantly
different. We tracked a common sparse set of scene points in the two
views and computed the mean locations. We also verified that the
distribution of tracker displacements is zero-mean in all
Figure 12. Distribution of x and y displacement of trackers in an
outdoor image sequence, computed over all trackers and all frames
from an image sequence in the afternoon. Consistent with our
assumption, they follow a zero-mean distribution.


Figure 13. Jitter stereo in turbulence. (a) (Uncalibrated) disparity
computed from two video frames at a certain time; due to turbulence,
the disparity is noisy. Over the 319 frames, the correlation
coefficient varies from 0.632 to 0.954, with a mean of 0.884 and a
standard deviation of 0.054. (b) (Uncalibrated) disparity using mean
tracker locations. The disparity is clearly linear (correlation
coefficient is 0.976).

our experiments (see Fig. 12). The disparities between the mean
locations of corresponding trackers are then estimated. Note that
the disparity of a scene point on a plane is a linear function of
its x and y coordinates on the image. The linear fit is strong, with
a correlation coefficient of 0.976, and is significantly better than
computing disparities on a per-frame basis, as shown in Fig. 13.
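One way to test the planar linearity described above is an ordinary least-squares fit of disparity against image coordinates. The sketch below is illustrative: the function name, the synthetic coordinates, and the choice to report the correlation between fitted and measured disparities are assumptions, since the exact definition of the correlation coefficient used for Fig. 13 is not spelled out here.

```python
import numpy as np


def fit_linear_disparity(x, y, d):
    """Least-squares fit d ~ a*x + b*y + c.

    Returns the coefficients (a, b, c) and the correlation coefficient
    between the fitted and the measured disparities.
    """
    design = np.column_stack([x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(design, d, rcond=None)
    fitted = design @ coeffs
    r = np.corrcoef(fitted, d)[0, 1]
    return coeffs, r


# Hypothetical mean tracker locations (pixels) and noise-free planar disparities.
x = np.array([10.0, 200.0, 50.0, 320.0, 150.0, 400.0])
y = np.array([20.0, 40.0, 300.0, 250.0, 120.0, 60.0])
true_coeffs = np.array([0.002, -0.001, 0.5])
d = true_coeffs[0] * x + true_coeffs[1] * y + true_coeffs[2]

coeffs, r = fit_linear_disparity(x, y, d)
print("fitted (a, b, c):", coeffs)
print("correlation coefficient:", r)
```

With real data, `d` would hold the disparities between mean tracker locations, and a correlation close to 1 would indicate that the averaged disparities are consistent with a planar scene.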

8.  Comparisons  with  Depth-from-X 

This work introduces turbulence as a new cue for scene depth. So, it
is instructive to discuss the parallels and differences between
depth from turbulence and other depth-from-X approaches in computer
vision.

Structure from motion (SFM): SFM relies on estimating the pixel
disparity across different views of the scene, which decreases with
depth. In the case of turbulence, the variance of a projected scene
point is measured from the same camera position over time and
monotonically increases with scene distance. Thus, while SFM is
suited for shorter distances (for a fixed baseline), depth from
turbulence is better in general for a longer range and/or stronger
turbulence. But if the scene is outside the turbulence region, the
depth precision degrades in the asymptotic region of the variance
curve (Fig. 4). At the same time, both approaches share the same
issues with finding and tracking corresponding features.
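The opposite depth sensitivities of the two cues can be made explicit with a short worked comparison. This is a sketch under assumptions: f (focal length in pixels), B (baseline), Z (depth), and k (a turbulence-dependent constant) are symbols introduced here for illustration, the disparity formula is the standard rectified two-view relation, and the linear variance model is an assumption suggested by the outdoor variance ratios tracking the distance ratio. It applies only while the scene lies inside the turbulence region.

```latex
% Rectified stereo / SFM: disparity and its sensitivity to depth
d(Z) = \frac{f B}{Z},
\qquad
\left| \frac{\partial d}{\partial Z} \right| = \frac{f B}{Z^{2}}

% Homogeneous turbulence (assumed linear model inside the turbulent region)
\sigma^{2}(Z) \approx k Z,
\qquad
\frac{\partial \sigma^{2}}{\partial Z} = k
```

Under these assumptions the disparity sensitivity falls off as 1/Z^2, while the variance sensitivity stays constant with depth, which is one way to read the claim that turbulence is the better cue at long range.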

Depth from defocus or diffusion (DFD): In both turbulence and DFD,
the point-spread function varies across the scene. The extent of the
observed blur monotonically increases with distance from the sensor
in the case of turbulence, and with distance from the focal/diffuser
plane in the case of DFD [7, 25]. Depth from turbulence requires
capture of a temporal sequence of images, and is similar to moving a
pinhole across the aperture of the lens to compute depth [1].

Structure from bad weather: Perhaps depth from fog or haze [14] is
most similar in spirit to depth from turbulence. These approaches
also use a single viewpoint, provide measures that are more reliable
for scenes with long distances, and are (mostly) independent of the
scene reflectance properties. That said, there are also fundamental
differences. Turbulence is a statistical and temporally varying
phenomenon, where depth cues are due to phase variations of the
incident light rather than the intensity variations as in fog. The
environmental illumination (air light) provides a strong cue for
depth from fog, whereas the specific illumination geometry of the
environment plays little or no part in depth from turbulence.

9.  Conclusion  and  Future  Work 

This article is an initial attempt at understanding what depth cues
can be extracted from optical turbulence. We derived a simple
relation between scene depths and the variance of the projected
scene points under turbulence. Our experiments showed, somewhat
surprisingly, that accurate depth cues can be obtained from optical
turbulence. There are several avenues of future work, including
dense scene reconstruction and image super-resolution from an image
sequence captured under turbulence. We also wish to study several
other related physical phenomena. The twinkling of stars is caused
primarily by changes in the amplitude of the incident wave, which
are distance-related as well. Aside from temperature gradients, the
chaotic movement of a medium itself can cause turbulence. This type
of phenomenon can occur due to underwater currents, strong wind flow
in the upper atmosphere, and air flow around engines. We wish to
build upon this work and apply it to these scenarios.